STA 9750 Final Project: A Tale of 59 NYC Community Districts
Introduction
The Overarching and Specific Questions
The COVID-19 pandemic established new urban norms, transforming job proximity from a strict requirement into a flexible option through remote work. This paradigm shift calls for an analysis of how traditional determinants of property value adapted to these new dynamics. Our research team’s work addresses the overarching question (OQ):
Did COVID-19 reshape the relationship between neighborhood characteristics and property values across NYC’s Community Districts (CDs)?
While the team’s broader effort examines crime, density, job accessibility, and transit, this analysis focuses on Educational Attainment (EA). Historically, high-education neighborhoods commanded significant price premiums, consistent with well-known agglomeration effects. The pandemic, however, may have disrupted these patterns, which motivates this analysis’s specific question (SQ):
Did the strength of the relationship between neighborhood educational attainment and property values change post-COVID, and did this change differ across NYC’s Community Districts?
Hypotheses
Given the pandemic’s impact on work-life demands and its redefinition of housing needs, this analysis tests two hypotheses.
Hypothesis 1: The positive correlation between education and property values would strengthen post-COVID.
Hypothesis 2: High-education CDs would experience stronger property value growth post-COVID as remote work freed professionals to prioritize neighborhood amenities over commute times.
Beyond testing these hypotheses, this analysis also investigates whether pandemic-era shifts represent a temporary disruption or a systemic realignment of urban real estate economics.
Data Acquisition and Processing
This analysis integrates four data sources through programmatic acquisition, requiring careful transformation to align incompatible geographic coding systems across federal and city databases.
Setup and Configuration
Loading the required packages and defining global constants sets up a pipeline that programmatically assembles a 59-CD panel for 2017–2019 vs. 2021–2023 and merges it with baseline 2019 American Community Survey (ACS) education variables.
NYC Community District Shapefile
The get_nyc_cd() function retrieves the official shapefile from the NYC Department of City Planning, unzips it, and constructs standardized identifiers for all 59 Community Districts.
PLUTO BBL-to-CD Crosswalk
The get_pluto_cd_crosswalk() function uses raw PLUTO data from NYC Open Data to build a Borough–Block–Lot (BBL) to CD crosswalk. This process normalizes approximately 858,000 BBLs, resolving formatting inconsistencies and creating standardized linkages between individual tax lots and CDs.
Normalization removed 1,154 invalid entries (0.1%), preventing silent join failures downstream, as shown in Table 1 below.
| Stage | Count | Percentage |
|---|---|---|
| Raw PLUTO records | 858,284 | 100% |
| After BBL normalization | 857,130 | 99.9% |
| Records removed | 1,154 | 0.1% |
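The internals of get_pluto_cd_crosswalk() are not shown, so the following Python sketch only illustrates what normalizing a BBL typically involves. It assumes, hypothetically, that raw BBLs arrive either as numbers that may carry a trailing ".0" or as hyphenated borough-block-lot strings, and it treats anything that cannot become a 10-digit BBL with a borough code of 1–5 as invalid. The helpers normalize_bbl() and build_crosswalk() are illustrative, not part of the project.

```python
import re

def normalize_bbl(raw):
    """Return a 10-digit BBL string (1-digit borough + 5-digit block + 4-digit lot), or None if invalid."""
    if raw is None:
        return None
    text = str(raw).strip()
    text = re.sub(r"\.0+$", "", text)  # "1000010010.0" -> "1000010010"
    parts = text.split("-")
    if len(parts) == 3 and all(p.isdigit() for p in parts):
        boro, block, lot = parts  # "1-1-10" -> "1" + "00001" + "0010"
        text = boro + block.zfill(5) + lot.zfill(4)
    if not re.fullmatch(r"[1-5]\d{9}", text):
        return None  # wrong length, non-digits, or borough code outside 1-5
    if int(text[1:6]) == 0 or int(text[6:]) == 0:
        return None  # block 0 or lot 0 is not a real tax lot
    return text

def build_crosswalk(records):
    """records: iterable of (raw_bbl, cd) pairs. Returns (bbl -> cd dict, number of rows removed)."""
    crosswalk, removed = {}, 0
    for raw_bbl, cd in records:
        bbl = normalize_bbl(raw_bbl)
        if bbl is None or cd in (None, ""):
            removed += 1
            continue
        crosswalk[bbl] = str(cd)
    return crosswalk, removed
```

Tallying rejected rows, as build_crosswalk() does, is one way to produce a records-removed count like the one in Table 1.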
Department of Finance Rolling Sales Data
The get_dof_sales_year_boro() function automates collecting and cleaning the Annualized Sales reports from the NYC Department of Finance (DOF). Quality filters then remove transactions under $10,000 and restrict the data to residential tax classes (e.g., 1, 2, 2A, 2B, and 2C).
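The internals of get_dof_sales_year_boro() are not shown, so the Python sketch below illustrates only the two quality filters described above. The record fields price and tax_class are hypothetical stand-ins for the DOF columns, and the parser assumes prices may arrive as formatted strings such as "$1,250,000".

```python
RESIDENTIAL_TAX_CLASSES = {"1", "2", "2A", "2B", "2C"}  # classes listed in the text
MIN_PRICE = 10_000  # transactions under $10,000 are dropped

def parse_price(value):
    """Turn '$1,250,000', '1250000', or 1250000 into a float; None if blank or unparseable."""
    if value is None:
        return None
    text = str(value).replace("$", "").replace(",", "").strip()
    try:
        return float(text)
    except ValueError:
        return None

def filter_sales(sales):
    """Keep residential sales priced at or above MIN_PRICE.

    `sales` is a list of dicts with hypothetical keys "price" and "tax_class".
    """
    kept = []
    for sale in sales:
        price = parse_price(sale.get("price"))
        tax_class = str(sale.get("tax_class", "")).strip().upper()
        if price is None or price < MIN_PRICE:
            continue  # missing or below-threshold price
        if tax_class not in RESIDENTIAL_TAX_CLASSES:
            continue  # non-residential tax class
        kept.append(sale)
    return kept
```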
Enhanced BBL Matching: Two-Stage Approach
A two-stage join strategy combines exact BBL matching with a block-level fallback. Exact-match joins alone miss approximately 27.5% of transactions, largely because of condo billing BBLs: condo sales are recorded under individual unit BBLs, while PLUTO lists each condominium under a single billing BBL. The fallback therefore joins each remaining unmatched sale to the CD of its city block. This yields a 100% match rate and minimizes inaccuracies in high-density areas, as shown in Table 2.
| Stage | Count | Percentage |
|---|---|---|
| Total sales (all files) | 339,826 | 100% |
| Stage 1: Exact BBL matches | 246,392 | 72.5% |
| Stage 2: Block-level matches | 93,434 | 27.5% |
| Total matched to CDs | 339,826 | 100% |
| Overall match rate | | 100% |
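The two-stage join can be sketched in Python as follows, assuming BBLs have already been normalized to 10-digit strings, so that the first six characters identify the borough and block. How the project assigns a block to a CD is not stated; this sketch hypothetically uses the CD that holds most of the block’s PLUTO lots. All function names are illustrative.

```python
from collections import Counter, defaultdict

def block_key(bbl):
    """Borough + block: the first 6 characters of a normalized 10-digit BBL."""
    return bbl[:6]

def build_block_lookup(crosswalk):
    """Map each borough-block to the CD holding most of its PLUTO lots (ties: first seen)."""
    votes = defaultdict(Counter)
    for bbl, cd in crosswalk.items():
        votes[block_key(bbl)][cd] += 1
    return {block: counts.most_common(1)[0][0] for block, counts in votes.items()}

def match_sales(sale_bbls, crosswalk):
    """Stage 1: exact BBL match. Stage 2: block-level fallback.

    Returns a list of (bbl, cd, stage) tuples; stage is 1, 2, or None if unmatched.
    """
    block_lookup = build_block_lookup(crosswalk)
    results = []
    for bbl in sale_bbls:
        if bbl in crosswalk:
            results.append((bbl, crosswalk[bbl], 1))
        elif block_key(bbl) in block_lookup:
            results.append((bbl, block_lookup[block_key(bbl)], 2))
        else:
            results.append((bbl, None, None))
    return results
```

Tallying the stage field over all sales gives Stage 1 and Stage 2 shares like those reported in Table 2.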
Note: The project’s data acquisition code handles all pre-processing needed to establish the foundational datasets for this research. These background processes include structural validations that check data integrity before analysis.
American Community Survey Education Data
Sourced from Table B15003, the ACS education data are reported for approximately 2,200 census tracts whose boundaries do not align with NYC CD boundaries. This analysis therefore uses area-weighted aggregation, a geographic method that apportions each tract’s counts to CDs in proportion to the share of the tract’s area falling within each CD, so that each tract’s population is allocated in full:
\[\text{Total}_{\text{Pop BA+}} = \sum_{\text{tracts}} \left( \text{Tract}_{\text{Pop BA+}} \times \frac{\text{Intersection Area}}{\text{Tract Area}} \right)\]
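A minimal Python sketch of the weighting formula above, assuming the tract–CD intersection areas have already been computed with a spatial library (for example, sf in R, which the areal package builds on, or GeoPandas’ overlay in Python). Two further assumptions are worth stating. First, areal weighting presumes population is spread evenly within each tract. Second, BA+ Attainment is taken here as interpolated BA+ adults divided by the interpolated population aged 25 and over (in B15003, the bachelor’s, master’s, professional, and doctorate rows over the total row). The function and field names are illustrative.

```python
from collections import defaultdict

def area_weighted_ba_share(pieces):
    """Aggregate tract counts to CDs by area weighting and return BA+ share (%) per CD.

    pieces: one dict per tract-CD intersection with hypothetical keys
      "cd", "tract_ba_plus", "tract_pop25", "intersection_area", "tract_area".
    """
    ba_plus = defaultdict(float)
    pop25 = defaultdict(float)
    for piece in pieces:
        weight = piece["intersection_area"] / piece["tract_area"]  # share of tract area in this CD
        ba_plus[piece["cd"]] += piece["tract_ba_plus"] * weight
        pop25[piece["cd"]] += piece["tract_pop25"] * weight
    return {cd: 100 * ba_plus[cd] / pop25[cd] for cd in ba_plus if pop25[cd] > 0}
```

For example, a tract with 300 BA+ adults out of 1,000 adults that is split 60/40 between two CDs contributes 180 of 600 adults to the first CD and 120 of 400 to the second.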
Treating EA as a fixed baseline is a critical research control. Updating education data post-COVID would obscure the source of price changes: a shift could reflect changing buyer preferences or simply a changing population. Holding education constant at 2019 levels therefore helps isolate changes in housing demand from compositional shifts in who lives in each CD.
Table 3 below demonstrates this fixed-baseline strategy: the same 2019 ACS education data are applied to all six study years (2017–2019 and 2021–2023), enabling a cleaner isolation of pandemic-era market shifts.
| Year | Period | ACS Baseline | CDs |
|---|---|---|---|
| 2017 | Pre-COVID | 2019 | 59 |
| 2018 | Pre-COVID | 2019 | 59 |
| 2019 | Pre-COVID | 2019 | 59 |
| 2021 | Post-COVID | 2019 | 59 |
| 2022 | Post-COVID | 2019 | 59 |
| 2023 | Post-COVID | 2019 | 59 |
Temporal Scope and Final Integration
The core of this analysis compares two three-year periods: pre-COVID (2017-2019) and post-COVID (2021-2023).
| Component | Detail |
|---|---|
| Pre-COVID sales period | 2017, 2018, 2019 (3 years) |
| Post-COVID sales period | 2021, 2022, 2023 (3 years) |
| Excluded year | 2020 (pandemic disruption) |
| Education data (baseline) | ACS 2015–2019 (5-year), used as baseline for both periods |
Omitting 2020 is essential to ensure analysis integrity. Given the acute shocks from this year, any statistical anomalies may distort long-term trend analysis; thus, bypassing it yields a clearer view of the market’s post-COVID response.
Table 5 confirms the final integration: every CD contributes exactly three years of median prices to each period, and each CD carries exactly one BA+ value from a single ACS vintage, holding education constant at the 2019 baseline.
| Min Years per CD-Period | Max Years per CD-Period | Min BA+ Values per CD | Max BA+ Values per CD | Min ACS Vintages per CD | Max ACS Vintages per CD |
|---|---|---|---|---|---|
| 3 | 3 | 1 | 1 | 1 | 1 |
The approaches in the Data Acquisition and Processing section create a balanced panel of 354 CD-year observations: 59 CDs, each with six years’ worth of data. This structure supports this research’s Difference-in-Differences (DiD) analysis, helping attribute identified changes to the post-pandemic period rather than to gaps or imbalances in the data.
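The checks summarized in Table 5 can be expressed as a few validations over the CD-year panel. This Python sketch assumes hypothetical row fields (cd, year, ba_plus). It verifies 59 CDs, exactly the years 2017–2019 and 2021–2023 for each CD (so 2020 is absent), a single baseline BA+ value per CD, and 59 × 6 = 354 rows.

```python
from collections import defaultdict

PRE_YEARS = {2017, 2018, 2019}
POST_YEARS = {2021, 2022, 2023}

def check_panel(rows, n_cds=59):
    """Validate a CD-year panel; raise ValueError on the first problem found.

    rows: list of dicts with hypothetical keys "cd", "year", and "ba_plus".
    """
    years_by_cd = defaultdict(set)
    ba_by_cd = defaultdict(set)
    for row in rows:
        years_by_cd[row["cd"]].add(row["year"])
        ba_by_cd[row["cd"]].add(row["ba_plus"])
    if len(years_by_cd) != n_cds:
        raise ValueError(f"expected {n_cds} CDs, found {len(years_by_cd)}")
    for cd, years in years_by_cd.items():
        if years != PRE_YEARS | POST_YEARS:
            raise ValueError(f"{cd}: years {sorted(years)} do not match the 3 + 3 design")
        if len(ba_by_cd[cd]) != 1:
            raise ValueError(f"{cd}: BA+ should be a single 2019 baseline value")
    if len(rows) != n_cds * len(PRE_YEARS | POST_YEARS):
        raise ValueError("duplicate CD-year rows")  # sets match, so extra rows are duplicates
    return True
```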
Pre-COVID Analytical Framework
Creating the Analysis Set
This baseline analysis addresses the OQ by establishing EA as a strong neighborhood predictor pre-pandemic. Quantifying this “education premium” establishes the context for the study to determine whether the pandemic weakened or strengthened the link between EA and property values.
To analyze how different CDs responded to the pandemic, this research applied a non-parametric tercile grouping, stratifying CDs by EA into “Low,” “Medium,” and “High” tiers to limit the influence of outliers when comparing distributions.
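The exact tercile rule in the project’s code is not shown. This Python sketch cuts at the 1/3 and 2/3 sample quantiles using linear interpolation (statistics.quantiles with method="inclusive"), which for 59 distinct values yields groups of 20, 19, and 20, the same sizes as in Table 6. The function name is illustrative.

```python
from statistics import quantiles

def assign_terciles(ba_plus):
    """Return 'Low'/'Medium'/'High' for each value, cutting at the 1/3 and 2/3 sample quantiles."""
    q1, q2 = quantiles(ba_plus, n=3, method="inclusive")
    labels = []
    for value in ba_plus:
        if value <= q1:
            labels.append("Low")
        elif value <= q2:
            labels.append("Medium")
        else:
            labels.append("High")
    return labels
```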
Table 6 shows a clear delineation: Low-education CDs average 19.3% BA+ Attainment, while High-education CDs average 59.6% (median 52.8%). This variation establishes a tangible baseline for comparing how CD housing markets evolved during the pandemic era.
| Education Group | Number of CDs | Min BA+ (%) | Max BA+ (%) | Mean BA+ (%) | Median BA+ (%) |
|---|---|---|---|---|---|
| Low | 20 | 11.7 | 27.3 | 19.3 | 19.4 |
| Medium | 19 | 28.5 | 40.2 | 34.1 | 34.5 |
| High | 20 | 40.5 | 82.5 | 59.6 | 52.8 |
This tercile structure enables parallel-trends testing.
Note: The roughly 40 percentage point (pp) gap between the High and Low tercile means (59.6% - 19.3%) is revisited in the internal consistency check of the regression analysis.
Pre-Trend Diagnostics
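A DiD design compares how outcomes changed over time across groups, differencing out stable differences between groups (e.g., fixed neighborhood characteristics) and shocks common to all groups (e.g., citywide economic shifts), so that the analysis can focus on the pandemic’s differential effect. The design rests on the Parallel-Trends Assumption (PTA): absent the pandemic, the education terciles would have continued on parallel paths. This research evaluates the PTA to support interpreting any post-COVID divergence as a genuine structural shift in market behavior rather than a continuation of pre-existing trends.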
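For reference, the generic two-group, two-period DiD estimator can be written as follows, where \(\bar{Y}_{g,t}\) is the mean outcome of tercile \(g\) in period \(t\). This is a sketch of the textbook 2×2 form, not necessarily the exact specification used in this project:
\[\widehat{\text{DiD}} = \left(\bar{Y}_{\text{High},\,\text{post}} - \bar{Y}_{\text{High},\,\text{pre}}\right) - \left(\bar{Y}_{\text{Low},\,\text{post}} - \bar{Y}_{\text{Low},\,\text{pre}}\right)\]
When the outcome is already each CD’s percent price change from 2017–2019 to 2021–2023, the within-group differences are built in, so the estimate reduces to a difference in mean changes. Using the tercile means reported in the results (11.86% for High, 26.03% for Low):
\[\widehat{\text{DiD}} = 11.86 - 26.03 = -14.17 \text{ pp}\]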
Logarithmic Transformation
To execute this comparison meaningfully, this analysis applies a logarithmic transformation to property values. Because High-EA CDs begin at significantly higher baselines, comparing raw dollar changes would blur differences in trends. On the log scale, slopes represent relative appreciation rates, making them comparable across terciles. For example, a shift of 0.10 log points corresponds to a change of approximately 10% (precisely, \(e^{0.10} - 1 \approx 10.5\%\)).
Figure 1 shows that High-EA CDs start at roughly twice the price level of Low-EA CDs, yet the three trajectories rise at broadly similar rates, which supports the PTA.

Table 7 below reports each tercile’s pre-COVID trend, with annual growth ranging from 3.0% to 8.2%.
| Education Group | Log-Point Slope (per year) | Annual Growth (%) |
|---|---|---|
| Low | 0.079 | 8.2 |
| Medium | 0.030 | 3.0 |
| High | 0.043 | 4.4 |
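A pre-COVID slope for each tercile can be estimated by regressing the log of its median sale price on year for 2017–2019. The Python sketch below does this with plain least squares, assuming hypothetically that the input is one median price per tercile-year (the project may instead pool CD-level observations). It also converts a log-point slope \(b\) to annual percent growth with \(e^{b} - 1\), which reproduces the second column of Table 7 from the first.

```python
import math

def log_slope(years, prices):
    """OLS slope of log(price) on year, in log points per year."""
    logs = [math.log(p) for p in prices]
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(logs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, logs))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx

def annual_growth_pct(slope):
    """Convert a log-point slope to an annual percent change."""
    return 100 * (math.exp(slope) - 1)
```

For instance, annual_growth_pct(0.079), annual_growth_pct(0.030), and annual_growth_pct(0.043) round to 8.2, 3.0, and 4.4, matching Table 7.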
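The largest inter-group slope difference, 0.049 log points per year (Low vs. Medium), is treated here as small enough to satisfy the PTA. This supports the interpretation that post-pandemic changes represent structural shifts rather than continuations of earlier trajectories.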
Education and Value: A Lesson on Geography
NYC’s exceptionally diverse EA landscape requires establishing baseline disparities before testing pandemic impacts, as shown in Table 8 below.
| CDs | Mean | SD | Min | 25th %ile | Median | 75th %ile | Max |
|---|---|---|---|---|---|---|---|
| 59 | 37.7% | 19.9% | 11.7% | 24.4% | 34.5% | 43.9% | 82.5% |
EA varies widely across CDs (SD ≈ 19.9 pp). The 19.5 pp gap between the 25th and 75th percentiles supports tercile grouping, which compares CDs by rank and limits the influence of extreme values.
Table 9 below further highlights these disparities, with BA+ Attainment ranging from 11.7% (BX01, the South Bronx) to 82.5% (MN05, Midtown Manhattan), a difference of roughly 71 pp.
| Rank | CD ID | Borough | BA+ Attainment |
|---|---|---|---|
| Top 3 | MN05 | Manhattan | 82.5% |
| Top 3 | MN01 | Manhattan | 81.5% |
| Top 3 | MN08 | Manhattan | 81.1% |
| Bottom 3 | BX01 | Bronx | 11.7% |
| Bottom 3 | BX05 | Bronx | 12.2% |
| Bottom 3 | BX06 | Bronx | 12.9% |
The interactive Leaflet map below highlights EA disparities across CDs. Clicking on an individual CD reveals its BA+ Attainment percentage.
Post-COVID Analysis and Results
The Education Reversal
The post-COVID scatter plot below reveals a weaker but still positive relationship between EA and property values (r = 0.74, down from r = 0.77 pre-COVID).

While high EA neighborhoods maintain an absolute price advantage, the correlation’s decline suggests the education premium diminished during the pandemic years, leading to the rejection of Hypothesis 1.
The regression line’s flatter slope indicates that each additional pp of BA+ Attainment predicts a smaller price differential than in the pre-COVID period. Table 10 shows the actual price changes across terciles, revealing which groups appreciated fastest.
| Education Group | CDs | Pre-COVID Median | Post-COVID Median | Change ($) | Mean CD Change (%) |
|---|---|---|---|---|---|
| Low | 20 | $517,091 | $641,946 | $124,854 | 26.03% |
| Medium | 19 | $720,373 | $813,311 | $92,938 | 15.03% |
| High | 20 | $1,031,182 | $1,127,420 | $96,237 | 11.86% |
| All CDs | 59 | $756,823 | $861,699 | $104,875 | 17.68% |
These findings indicate a remarkable reversal of the traditional education premium. From 2017-19 to 2021-23, Low-education CDs experienced 26.03% average median-price growth, whereas Medium- and High-education CDs grew by only 15.03% and 11.86%, respectively. Most notably, High-education CDs grew at less than half the rate of their Low-education counterparts, a 14.2 pp difference.
This result rejects Hypothesis 2, as Low-education CDs saw the fastest appreciation. The Leaflet map below highlights this appreciation pattern across all CDs.
To contextualize this reversal: the median-priced home in BX07, a low-education CD, gained approximately $252,500 in value, compared with about $5,000 in MN07, a high-education CD. This $247,500 difference is associated with the neighborhoods’ contrasting EA composition.
The t-test in Table 11 below confirms this pattern.
| Comparison | Difference | 95% CI | t-statistic | df | p-value |
|---|---|---|---|---|---|
| High − Low | -14.2 pp | [-22.9, -5.4] | -3.27 | 37 | 0.002 |
The difference between the High and Low terciles is statistically significant (p = 0.002), and its 95% confidence interval excludes zero. This outcome indicates that the pattern is unlikely to have occurred by chance.
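The text does not state which two-sample t-test was used. The reported df of 37 for two groups of 20 CDs is consistent with Welch’s unequal-variance test, since a pooled test would give 38. The Python sketch below computes Welch’s statistic and the Welch–Satterthwaite df with the standard library only. A p-value would additionally need a t-distribution CDF, which is omitted here.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b) and the Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

Passing the High-EA CD price changes as a and the Low-EA changes as b gives the High − Low direction used in the t-test table. Where SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) performs the same test and also returns a p-value.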
Figure 4 below shows the profound magnitude of this reversal.

Non-overlapping confidence intervals indicate distinct economic outcomes for the High- and Low-EA groups, suggesting that the 14.2 pp divergence reflects a structural shift rather than an anomaly.
Parametric Regression Analysis
While terciles demonstrate the reversal’s magnitude, a continuous regression of each CD’s post-COVID price change on its BA+ Attainment quantifies how each incremental BA+ percentage point relates to post-COVID appreciation.
As shown in Table 12, each additional pp of BA+ Attainment predicts 0.380 pp less price growth. This statistically significant (p < 0.001) negative coefficient contrasts sharply with the pre-COVID pattern, where higher EA predicted higher prices. In growth terms, the relationship has not only weakened; it has reversed.
| Term | Coefficient | Std. Error | 95% CI Lower | 95% CI Upper | t-statistic | p-value |
|---|---|---|---|---|---|---|
| Intercept | 32.006 | 3.340 | 25.317 | 38.695 | 9.58 | < 0.001 |
| BA+ Attainment (%) | -0.380 | 0.078 | -0.537 | -0.223 | -4.84 | < 0.001 |
Comparing the pre- and post-COVID models reveals the extent of this shift.
In the pre-COVID period, higher education predicted higher absolute prices, with each pp of BA+ Attainment adding nearly $14,000 to median home values.
- \(\widehat{\text{Median Price}} = \$235,462 + \$13,816 \times \text{BA+\%}\)
In the post-COVID period, this relationship inverted: higher education predicted slower price appreciation.
- \(\widehat{\text{Price Change}} = 32.01\% - 0.380 \times \text{BA+\%}\)
Moreover, the post-COVID regression yields an \(R^2\) of 0.291, indicating that baseline EA alone explains 29.1% of the variation in price appreciation. Although lower than the pre-COVID model’s, this \(R^2\) represents reasonable explanatory power for a growth metric, indicating that EA remained a meaningful, though inverted, predictor of market disparity during the pandemic.
| R² | Adjusted R² | Residual SE | F-statistic | p-value |
|---|---|---|---|---|
| 0.291 | 0.279 | 11.89 pp | 23.42 | < 0.001 |
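The post-COVID regression is a one-predictor least-squares fit, so its coefficients and fit statistics can be reproduced without a modeling library. In this illustrative Python sketch, simple_ols() is a hypothetical helper; x would hold each CD’s baseline BA+ Attainment and y its pre-to-post percent price change.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return coefficients and fit statistics."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need matching x and y with at least 3 observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - 2),
        "resid_se": (ss_res / (n - 2)) ** 0.5,
        "f_stat": r2 / ((1 - r2) / (n - 2)),
    }
```

Two quick consistency checks follow from these formulas. First, for a one-predictor model, F = R²(n − 2)/(1 − R²); with R² = 0.291 and n = 59 this gives about 23.4, in line with the reported 23.42 after rounding (and with t² = 4.84² ≈ 23.4). Second, a least-squares line passes through the point of means, so 32.006 − 0.380 × 37.7 ≈ 17.68, the all-CD mean price change.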
Internal Consistency Check
The tercile-based and regression-based approaches should yield consistent estimates if the education-growth relationship is approximately linear.
From Table 12, each additional pp of BA+ Attainment predicts a 0.380 pp decrease in price growth. Moreover, Table 6 shows an average education gap of about 40.3 pp between the High and Low tercile means (59.6% - 19.3%, or 40.4 pp in Table 14 before rounding).
Using the regression coefficient, it is possible to predict the expected difference:
\[\text{Predicted Difference} = \text{Education Gap} \times \text{Regression Coefficient}\]
\[\text{Predicted Difference} = 40.3 \text{ pp} \times (-0.380) \approx -15.3 \text{ pp}\]
This means the regression model predicts that High-EA CDs should grow about 15.3 pp less than Low-EA CDs.
From Table 10, Low-EA CDs actually grew 14.2 pp more than High-EA CDs (26.03% - 11.86% = 14.17 pp ≈ 14.2 pp).
Table 14 compares these two estimates to assess internal consistency.
| Quantity | Value |
|---|---|
| Average education gap (High − Low) | 40.4 pp |
| Regression coefficient (pp per 1 pp BA+) | -0.38 |
| Predicted DiD from regression (High − Low) | -15.3 pp |
| Observed DiD from tercile table (High − Low) | -14.2 pp |
The regression-based prediction (-15.3 pp) closely matches the observed tercile difference (-14.2 pp), both expressed as High minus Low. The two differ by only about 1.1 pp, under 8% of the predicted value, confirming internal consistency between the non-parametric (tercile) and parametric (regression) approaches.
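The arithmetic of this check is small enough to script, which makes it easy to rerun whenever the underlying tables change. The Python sketch below plugs in the rounded tercile means (59.6% and 19.3%), the rounded regression slope (-0.380), and the observed High − Low difference in mean price change (11.86% − 26.03%). The function name is illustrative.

```python
def regression_implied_gap(high_mean_ba, low_mean_ba, slope):
    """Growth difference (High - Low, in pp) implied by a linear slope and the tercile BA+ gap."""
    return (high_mean_ba - low_mean_ba) * slope

predicted = regression_implied_gap(59.6, 19.3, -0.380)  # about -15.3 pp
observed = 11.86 - 26.03  # High - Low mean price change, about -14.2 pp
relative_gap = abs(predicted - observed) / abs(predicted)  # relative disagreement

print(f"predicted {predicted:.1f} pp, observed {observed:.1f} pp, relative gap {relative_gap:.1%}")
```

With these rounded inputs, the prediction and the observation differ by roughly 1.1 pp.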
Whether comparing discrete education groups or modeling continuous relationships, the conclusion remains the same: Low-EA CDs experienced substantially faster price growth during the post-COVID period, with the magnitude of this reversal measuring approximately 14-15 pp.
Robustness: Citywide Validation by Borough
The previous sections highlighted stark differences in EA and property values between two boroughs, Manhattan and the Bronx. However, it is essential to examine whether this inverted education-growth relationship holds across all five boroughs, as each has a different demographic composition and a unique pandemic experience.
| Borough | CDs | Mean Change | SD | Min | Max |
|---|---|---|---|---|---|
| Manhattan | 12 | 6.4% | 15.9% | -16.1% | 45.7% |
| Bronx | 12 | 31.7% | 11.7% | 4.8% | 51.5% |
| Staten Island | 3 | 22.6% | 1.5% | 21.6% | 24.3% |
| Queens | 14 | 16.9% | 11.7% | 2% | 38.2% |
| Brooklyn | 18 | 15.7% | 8.6% | 1.1% | 39.4% |
Table 15 shows that average property values grew in all five boroughs, though growth rates varied considerably, from 6.4% in Manhattan to 31.7% in the Bronx. Manhattan exhibited the highest volatility (\(SD = 15.9\%\)), with its CDs ranging from a 16.1% decline to 45.7% growth. The Bronx and Queens showed moderate variation (\(SD = 11.7\%\)), while Brooklyn remained relatively consistent (\(SD = 8.6\%\)).
Table 16 quantifies the within-borough relationship between EA and price growth.
| Borough | CDs | Correlation (r) | Slope (pp per 1 pp BA+) |
|---|---|---|---|
| Staten Island | 3 | -0.991 | -0.682 |
| Bronx | 12 | -0.346 | -0.476 |
| Manhattan | 12 | -0.389 | -0.298 |
| Queens | 14 | -0.221 | -0.230 |
| Brooklyn | 18 | -0.009 | -0.005 |
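Per-borough correlations and slopes amount to the same simple regression run within each borough. In this hypothetical Python sketch, rows are (borough, BA+ %, price change %) tuples. With only three CDs, Staten Island’s estimate rests on three points, so a correlation near -1 there is far less informative than one computed over a dozen or more CDs.

```python
import math
from collections import defaultdict

def borough_slopes(rows):
    """rows: iterable of (borough, ba_plus_pct, price_change_pct).

    Returns {borough: (n_cds, pearson_r, ols_slope)}; boroughs with fewer than
    3 CDs or no variation in either variable are skipped.
    """
    groups = defaultdict(list)
    for borough, x, y in rows:
        groups[borough].append((x, y))
    results = {}
    for borough, pairs in groups.items():
        n = len(pairs)
        if n < 3:
            continue
        x_bar = sum(x for x, _ in pairs) / n
        y_bar = sum(y for _, y in pairs) / n
        sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
        syy = sum((y - y_bar) ** 2 for _, y in pairs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        if sxx == 0 or syy == 0:
            continue
        results[borough] = (n, sxy / math.sqrt(sxx * syy), sxy / sxx)
    return results
```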
All five boroughs show negative correlations and negative slopes between EA and price growth, suggesting the pattern extends beyond any single market, although the strength of the relationship varies widely.
Staten Island (SI) exhibits the strongest negative relationship (r = -0.991, slope = -0.682), though with only three CDs this estimate is highly sensitive to any single district.
Brooklyn’s near-zero slope (-0.005) and correlation (-0.009) suggest that other factors (e.g., gentrification) drove appreciation more than EA did. Its close proximity to Manhattan may also have outweighed EA in determining appreciation. Consequently, Brooklyn’s unique outcome calls for deeper investigation in future research.
Figure 5 visualizes these borough-specific slopes for direct comparison.

This figure confirms that no borough experienced the traditional positive education-growth relationship during the post-COVID period. In four of five boroughs, the relationship was clearly negative, with slopes ranging from -0.230 to -0.682.
Discussion
Interpretation
Several factors likely explain this citywide reversal of the education premium.
Remote work reduced the need to live near employment-rich, amenity-filled hubs, which are usually concentrated in high-EA CDs. Affordability pressures also led buyers to migrate toward undervalued CDs in the outer boroughs, which offered greater access to homeownership. Lastly, lower-valued CDs experienced a post-pandemic market correction in the form of catch-up growth, as demand shifted away from the urban core toward the periphery (the so-called donut effect), raising prices in once-affordable CDs.
These factors likely worked together as a systemic feedback loop. As the agglomeration premium of high-EA CDs diminished, market demand shifted toward peripheral, outer-borough markets. This demand surge created a compounding effect, accelerating appreciation in Low-EA districts while High-EA growth stagnated.
It is essential to clarify that this analysis focuses on highlighting a change in growth rather than hierarchy. High-EA neighborhoods maintained their absolute price dominance throughout the pandemic. The reversal occurred primarily in appreciation rates. However, it remains unclear whether these findings reveal a permanent value change or a temporary disruption.
Contribution to Overarching Question
This analysis demonstrates that COVID-19 reshaped the relationship between neighborhood characteristics and property values by reversing the education premium, which was historically a strong predictor of urban real estate prices. Moreover, this finding connects with the team’s individual analyses of other neighborhood characteristics. This research:
- Aligns with transit findings: both accessibility and education premiums weakened.
- Contextualizes density analysis: the apparent density penalty was likely driven by education rather than density itself.
- Contrasts with job accessibility: job accessibility remained stable while high-EA CDs lost their premium and low-EA CDs gained value.
- Provides baseline context for crime results: heterogeneous effects by initial crime conditions mirror our tercile patterns.
The education reversal appears to be the dominant structural shift, with other characteristics showing stability (jobs) or modest weakening (density and transit). This suggests pandemic housing dynamics recalibrated a longstanding urban economic geography.
Limitations and Conclusion
While this analysis provides clear evidence of a post-pandemic reversal, several limitations qualify the results. First, CD-level data may obscure location-specific variations, such as gentrification within Low-EA areas. Second, residents in High-EA CDs may have held onto their properties; low growth may therefore reflect a lack of available inventory, masking retention premiums. Finally, because the data extend only through 2023, it is unclear whether these findings indicate temporary disruptions or permanent shifts, especially given recent employer mandates calling for employees to return to physical work locations.
Despite these limitations, this research shows that the transition from the pre- to the post-COVID period reversed the relationship between neighborhood EA and property value growth in NYC. The pandemic not only disrupted the logic of NYC property values; it reshaped buyer preferences. Combined with affordability pressures, this created a new urban landscape in which traditional high-EA clusters, historically a driver of property value, matter less as demand shifts toward the value offered by outer-borough CDs.
References
Data Sources
NYC Department of City Planning. NYC Community Districts (nycd_25c) shapefile (ZIP).
https://s-media.nyc.gov/agencies/dcp/assets/files/zip/data-tools/bytes/community-districts/nycd_25c.zip
NYC Open Data (Socrata). Dataset resource 64uk-42ks (CSV endpoint).
https://data.cityofnewyork.us/resource/64uk-42ks.csv?$select=bbl,cd&$limit=5000000
NYC Open Data (Socrata). Dataset resource 64uk-42ks (CSV endpoint).
https://data.cityofnewyork.us/resource/64uk-42ks.csv?
NYC Department of Finance. Rolling Sales (annualized sales landing / directory).
https://www.nyc.gov/assets/finance/downloads/pdf/rolling_sales/annualized-sales
NYC Department of Finance. Definitions of property assessment terms.
https://www.nyc.gov/site/finance/property/definitions-of-property-assessment-terms.page
Methods and Technical References
The Effect Book. “Difference-in-Differences” (parallel trends section).
https://theeffectbook.net/ch-DifferenceinDifference.html#untreated-groups-and-parallel-trends
Columbia University Mailman School of Public Health. Difference-in-difference estimation.
https://www.publichealth.columbia.edu/research/population-health-methods/difference-difference-estimation
CRAN (R). areal vignette: Areal weighted interpolation.
https://cran.r-project.org/web/packages/areal/vignettes/areal-weighted-interpolation.html
Investopedia. Nonparametric Statistics.
https://www.investopedia.com/terms/n/nonparametric-statistics.asp
Background and Context Readings
Stanford SIEPR. Donut Effect: How COVID-19 Shapes Real Estate.
https://siepr.stanford.edu/publications/policy-brief/donut-effect-how-covid-19-shapes-real-estate
Freddie Mac. Migration and Housing Demand (Research note, PDF).
https://www.freddiemac.com/research/pdf/202206-Note-Migration-08.pdf
CNN. Remote work and the housing market.
https://www.cnn.com/2023/09/03/homes/remote-work-housing
USA Today. Workers defy return-to-office mandates.
https://www.usatoday.com/story/money/2025/08/27/work-from-home-workers-defy-rto-mandates/85821219007/
NYC Comptroller. New York: A City of Diverse Neighborhoods (report link as cited in document).
https://comptroller.nyc.gov/reports/
NYC Future. Boosting college attainment.
https://nycfuture.org/research/boosting-college-attainment
PSC-CUNY (Clarion). Barriers to college attainment.
https://psc-cuny.org/clarion/2021/february/barriers-college-attainment/
Peer Project Pages
Brinson, Madison. STA 9750 Individual Report.
https://madisonbrinson.github.io/STA9750-2025-FALL/individual_report_final.html
Li, Kelly. STA 9750 Final Project.
https://kelmli.github.io/STA9750-2025-FALL/final_proj.html
Ng Li, Tiffany. STA 9750 Individual Report.
https://tiffany-ngli.github.io/STA9750-2025-FALL/Individual%20Report.html
Socoy, Jonathan. STA 9750 Final Project.
https://socoyjonathan.github.io/STA9750-2025-FALL/final_project.html
